1 Introduction

Identifying the individuals most susceptible to a disease allows resources to be allocated productively. For diseases such as dementia, individuals both inherit risk factors and accrue them throughout life. No factor is causative on its own, but understanding what contributes to a high risk allows the public health sector to assess and prevent potential health crises at a population level (Baumgart et al. 2015). Dementia is a clinical syndrome characterized by difficulties in memory and language, psychological and psychiatric changes, and impairments in activities of daily life (Burns and Iliffe 2009). The complexity of dementia’s possible symptoms is reflected in its causes. Common origins of dementia are degenerative neurological diseases such as Parkinson’s or Alzheimer’s; however, vascular disorders in the brain, traumatic head injuries and some infections can also lead to a dementia diagnosis.

The data used in this analysis was obtained from a longitudinal study of 150 participants. Participants were right-handed, either male or female, and aged between 60 and 96. They were characterized as either nondemented, demented or converted (became demented during the course of the study). At each session, participants underwent T1-weighted MRI scans, the results of which are recorded in visit_data. Participants attended 2 or more sessions, each separated by at least a year.

This workflow addresses two questions: which factors are associated with an increased risk of dementia, and which factors are associated with an increased risk over time? It is important to note that no single determinant causes dementia. The profiles of two people characterized as having dementia may be completely different.

2 Methods

The workflow is produced with R, a statistical computing language (R Core Team 2020), and R Markdown, which generates this HTML report (Allaire et al. 2020). The bookdown package is used to add features to R Markdown such as cross-referencing (Xie 2016).

Data is imported using the tidyverse (Wickham et al. 2019) and readxl (Wickham and Bryan 2019) packages.

2.1 Data Description

The raw data consists of two Excel sheets within the same spreadsheet, dementia.xlsx. The first sheet, visit_data, contains information regarding visit numbers and MRI results. The second sheet, patient_data, has information on current dementia status, sex, education and social status. Each row is one patient’s data at one given time. Repeated subject_IDs occur because some patients had data collected once a year over multiple years. Explanations of each column can be seen in Table 2.1.

Table 2.1: Key Terms Table
Term Definition
MMSE Mini-Mental State Examination score (range: 0 = worst to 30 = best). A 30-point questionnaire used to measure cognitive impairment. A score above 24 is considered normal. Lower scores may correlate with dementia, although this is not true in every case.
CDR Clinical Dementia Rating (0 = no impairment, 0.5 = questionable, 1 = mild, 2 = moderate, 3 = severe). A clinical tool that measures relative dementia symptoms based on 6 domains (memory, orientation, judgment and problem solving, community affairs, home and hobbies, and personal care).
eTIV Estimated total intracranial volume (mm3)
nWBV Normalized whole-brain volume (%)
ASF Atlas Scaling Factor (unitless)
M_F Patient sex, Female is represented by a 1, Male is represented by a 2
EDUC Years of Education
SES Socioeconomic status, assessed by the Hollingshead Four Factor Index of Social Status, which measures the social status of an individual based on 4 domains: marital status, retired/employed status, educational attainment, and occupational prestige. A score of 1 indicates the highest status, while 5 indicates the lowest status.
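The CDR and MMSE scales in Table 2.1 can be encoded as small lookup helpers, which is convenient when labelling plots or tables. The sketch below is illustrative only: the function names cdr_label and mmse_normal are hypothetical and do not come from the original workflow.

```r
# Hypothetical helper: map CDR scores to the labels listed in Table 2.1.
cdr_label <- function(cdr) {
  labels <- c("0" = "no impairment", "0.5" = "questionable",
              "1" = "mild", "2" = "moderate", "3" = "severe")
  unname(labels[as.character(cdr)])
}

# Hypothetical helper: Table 2.1 treats an MMSE score above 24 as normal.
mmse_normal <- function(mmse) mmse > 24

cdr_label(c(0, 0.5, 2))
mmse_normal(c(30, 24, 17))
```

Scores outside the listed CDR values return NA rather than a label, which makes unexpected codes easy to spot.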

2.2 Data Transformation

Three data sets were created from the raw data. Each one starts by merging visit_data and patient_data by subject ID. After import and merging, variable names are cleaned with the janitor (Firke 2020) package. From here they differ as described below:

  1. Dementia: used to look at which factors are associated with an increased risk of dementia. Columns not used in analysis (subject_id, visit, group and mri_number) and rows with NA values are removed. Finally, the values in m_f have been converted to numerical values for analysis (F = 1, M = 2).

  2. Dementia2: used to look at factors contributing to dementia over time. This data set was intended for use in either a paired Student’s t-test or a paired-samples Wilcoxon test. The cdr and mri_number columns and rows with NA values were removed, and the rows were arranged in ascending order of visit number. The values in m_f were converted to numerical values for analysis (F = 1, M = 2). Rows with a group value of Nondemented or Demented were removed, as these participants did not change status over time. Only visits 1 and 2 were kept because, for most subjects, there was no data for visit 3 or higher. The OAS2_ prefix was removed from the subject_id strings. Subject_ids that appeared only once were removed, as they had no paired observation. Finally, the visit levels were ordered, purely so that start and end appear in order in the boxplots.

  3. Dementia_extract: used to generate some of the values used for inline reporting. In this data set all repeated subject_ids were removed so that participant counts could be reported accurately.
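The Dementia2 steps listed above can be sketched in base R on toy data. This is a minimal illustration under stated assumptions, not the report's actual pipeline: the two data frames stand in for the Excel sheets, their values are invented, and the original workflow uses tidyverse and janitor functions rather than the base R calls shown here.

```r
# Toy stand-ins for the two sheets; all values are invented for illustration.
visit_data <- data.frame(
  subject_id = c("OAS2_0001", "OAS2_0001", "OAS2_0002",
                 "OAS2_0003", "OAS2_0003", "OAS2_0003",
                 "OAS2_0004", "OAS2_0004"),
  visit = c(1, 2, 1, 1, 2, 3, 1, 2),
  mmse  = c(29, 28, 30, 30, 27, 26, 30, 30),
  stringsAsFactors = FALSE
)
patient_data <- data.frame(
  subject_id = c("OAS2_0001", "OAS2_0002", "OAS2_0003", "OAS2_0004"),
  group = c("Converted", "Converted", "Converted", "Nondemented"),
  m_f   = c("F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Merge the sheets by subject ID and drop rows with missing values.
merged <- na.omit(merge(visit_data, patient_data, by = "subject_id"))

# Recode sex as in the report (F = 1, M = 2).
merged$m_f <- ifelse(merged$m_f == "F", 1, 2)

# Keep converted participants and visits 1 and 2 only.
converted <- merged[merged$group == "Converted" & merged$visit %in% c(1, 2), ]

# Strip the OAS2_ prefix from the IDs.
converted$subject_id <- sub("^OAS2_", "", converted$subject_id)

# Drop subjects without a pair (IDs that appear only once).
counts <- table(converted$subject_id)
paired <- converted[converted$subject_id %in% names(counts[counts == 2]), ]

# Sort by visit and order the levels so Start precedes End in plots.
paired <- paired[order(paired$visit), ]
paired$visit <- factor(paired$visit, levels = c(1, 2),
                       labels = c("Start", "End"))
paired
```

In this toy example subject 0002 is dropped for having a single visit and subject 0004 for being nondemented, leaving two paired subjects.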

3 Which determinants correlate with increased CDR?

Using the dementia data set, this section looks at which determinants correlate with a high Clinical Dementia Rating (CDR); in other words, which determinants are linked with dementia. A list and explanation of the determinants used in this analysis can be seen in Table 2.1.
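As an illustration of how such an association might be tested, the sketch below runs a Spearman rank correlation between one determinant and CDR. The report does not state which test produced its p-values, so the choice of Spearman's method (a common option for an ordinal outcome such as CDR) is an assumption, and the data are invented.

```r
# Toy data: invented CDR and MMSE values for ten visits.
cdr  <- c(0, 0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 2)
mmse <- c(30, 29, 30, 28, 27, 26, 29, 22, 20, 17)

# Spearman rank correlation; exact = FALSE avoids warnings caused by ties.
res <- cor.test(mmse, cdr, method = "spearman", exact = FALSE)

res$estimate  # rho: negative here, since MMSE falls as CDR rises
res$p.value
```

The same call could be repeated for each determinant column to build a table of correlation coefficients and p-values.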

3.1 Scatter Plots

Plots are generated using ggplot2 from the tidyverse package (Wickham et al. 2019). Arrangement of plots into a grid was achieved using ggarrange from the ggpubr package (Kassambara 2020).

Figure 3.1: Scatter Plots That Demonstrate Correlations Between Determinant And A High CDR

3.2 Summary Table

Table generated using the kableExtra package (Zhu 2020).

Table 3.1: Dementia Summary Statistics Table
Determinant CDR Mean N Standard Deviation Standard Error Minimum Maximum
Age 0.0 77.1553398 206 8.0894478 0.5636185 60.000 97.000
Age 0.5 77.4363636 110 7.3015359 0.6961741 62.000 92.000
Age 1.0 74.3714286 35 6.8645968 1.1603286 61.000 96.000
Age 2.0 85.0000000 3 11.2694277 6.5064071 78.000 98.000
MMSE 0.0 29.2233010 206 0.9205729 0.0641394 25.000 30.000
MMSE 0.5 26.4636364 110 3.0400304 0.2898555 17.000 30.000
MMSE 1.0 20.3142857 35 5.2735267 0.8913887 4.000 30.000
MMSE 2.0 20.3333333 3 5.0332230 2.9059326 15.000 25.000
eTIV 0.0 1486.8592233 206 179.9986303 12.5410988 1106.000 2004.000
eTIV 0.5 1482.4545455 110 174.0359889 16.5936805 1143.000 1928.000
eTIV 1.0 1528.0000000 35 157.8443015 26.6805566 1274.000 1957.000
eTIV 2.0 1538.0000000 3 157.4452286 90.9010451 1401.000 1710.000
nWBV 0.0 0.7404515 206 0.0373497 0.0026023 0.644 0.837
nWBV 0.5 0.7205182 110 0.0345072 0.0032901 0.646 0.806
nWBV 1.0 0.6990571 35 0.0224564 0.0037958 0.657 0.756
nWBV 2.0 0.7066667 3 0.0503322 0.0290593 0.660 0.760
ASF 0.0 1.1971068 206 0.1405721 0.0097941 0.876 1.587
ASF 0.5 1.1995091 110 0.1365395 0.0130185 0.910 1.535
ASF 1.0 1.1600286 35 0.1146708 0.0193829 0.897 1.377
ASF 2.0 1.1490000 3 0.1146865 0.0662143 1.026 1.253
EDUC 0.0 15.1601942 206 2.7047506 0.1884489 8.000 23.000
EDUC 0.5 14.0090909 110 3.1781809 0.3030277 6.000 20.000
EDUC 1.0 14.0000000 35 2.4970571 0.4220797 8.000 20.000
EDUC 2.0 17.0000000 3 3.0000000 1.7320508 14.000 20.000
SES 0.0 2.3349515 206 1.0497116 0.0731369 1.000 5.000
SES 0.5 2.6818182 110 1.2186006 0.1161890 1.000 5.000
SES 1.0 2.5714286 35 1.2434703 0.2101848 1.000 5.000
SES 2.0 1.6666667 3 1.1547005 0.6666667 1.000 3.000
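The columns of Table 3.1 are related by the usual formula, standard error = standard deviation / sqrt(N). The sketch below shows one way to compute the same columns per group in base R, then checks the formula against the Age / CDR 0.0 row. The helper name summarise_by and the toy input are assumptions made for illustration; the original table was produced with tidyverse and kableExtra code that is not shown in the report.

```r
# Hypothetical helper: per-group mean, N, SD, SE, minimum and maximum.
summarise_by <- function(x, g) {
  do.call(rbind, lapply(split(x, g), function(v) {
    data.frame(mean = mean(v), n = length(v), sd = sd(v),
               se = sd(v) / sqrt(length(v)),
               min = min(v), max = max(v))
  }))
}

# Toy input: six invented values in two groups.
out <- summarise_by(c(1, 2, 3, 4, 5, 6), c("a", "a", "a", "b", "b", "b"))
out

# Check against Table 3.1 (Age, CDR 0.0): SD 8.0894478 with N = 206.
se_check <- 8.0894478 / sqrt(206)
se_check  # approximately 0.5636, in line with the reported standard error
```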

4 Influence of Individual Determinants Over A Time Period Of 2 Years

4.1 Box Plots

Plots are generated using ggplot2 from the tidyverse package (Wickham et al. 2019). Arrangement of plots into a grid was achieved using ggarrange from the ggpubr package (Kassambara 2020).

Figure 4.1: Boxplots Showing Determinant Data At The Start And End Of The Study In Converted Patients.

4.2 Summary Table

Table 4.1: Dementia2 Summary Statistics Table
Determinant Date Mean N Standard Deviation Standard Error Minimum Maximum
Age Start 76.1666667 12 7.7440104 2.2355032 65.000 86.000
Age End 78.8333333 12 7.0817734 2.0443319 67.000 88.000
MMSE Start 29.3333333 12 0.9847319 0.2842676 27.000 30.000
MMSE End 28.0000000 12 2.0889319 0.6030227 24.000 30.000
eTIV Start 1437.9166667 12 143.7722051 41.5034607 1264.000 1704.000
eTIV End 1446.2500000 12 150.2864689 43.3839666 1275.000 1722.000
nWBV Start 0.7402500 12 0.0350691 0.0101236 0.693 0.799
nWBV End 0.7284167 12 0.0361398 0.0104327 0.677 0.788
ASF Start 1.2311667 12 0.1176001 0.0339482 1.030 1.388
ASF End 1.2250000 12 0.1218345 0.0351706 1.019 1.376
EDUC Start 15.5000000 12 2.5761141 0.7436601 12.000 20.000
EDUC End 15.5000000 12 2.5761141 0.7436601 12.000 20.000
SES Start 1.8333333 12 1.0298573 0.2972942 1.000 4.000
SES End 1.8333333 12 1.0298573 0.2972942 1.000 4.000
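Section 2.2 notes that the dementia2 data set was prepared for either a paired Student's t-test or a paired-samples Wilcoxon test. A minimal sketch of both tests on start and end values for one determinant is given below. The numbers are invented nWBV-like values for six hypothetical participants, not the study data, and the report does not say which of the two tests was finally used.

```r
# Invented start and end values for six hypothetical converted participants.
start <- c(0.74, 0.75, 0.70, 0.78, 0.72, 0.76)
end   <- c(0.73, 0.73, 0.695, 0.765, 0.71, 0.748)

# Paired Student's t-test: is the mean within-subject change zero?
t_res <- t.test(end, start, paired = TRUE)

# Paired-samples Wilcoxon (signed-rank) test, a non-parametric alternative;
# exact = FALSE avoids warnings when differences are tied.
w_res <- wilcox.test(end, start, paired = TRUE, exact = FALSE)

t_res$statistic  # negative: end values are lower than start values
t_res$p.value
w_res$p.value
```

The t-test assumes the paired differences are roughly normal, while the Wilcoxon test does not, which is one reason to prefer the latter with a sample as small as the one in Table 4.1.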

5 Dementia Grouping Questionnaire

In addition, a linear discriminant analysis (LDA) model and a questionnaire whose responses are fed into the model can be found here: dementia_grouping_questionnaire.Rmd. The unique packages used in this are as follows: caret (Kuhn 2020), MASS (Venables and Ripley 2002), shiny (Chang et al. 2020) and shinyforms (Attali, n.d.). An explanation of package use can be found in the linked Rmd file. The model is trained to predict dementia grouping (demented or nondemented).
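To give a sense of how an LDA classifier of this kind works, the sketch below fits MASS::lda to simulated data and predicts the group for one new set of answers. The linked Rmd file is not reproduced here, so the predictors (mmse and nwbv), the simulated training data and the model formula are all assumptions rather than the author's actual model.

```r
library(MASS)

set.seed(42)
n <- 40

# Simulated training data; the means and spreads are invented.
toy <- data.frame(
  group = factor(rep(c("Nondemented", "Demented"), each = n)),
  mmse  = c(rnorm(n, 29, 1), rnorm(n, 24, 3)),
  nwbv  = c(rnorm(n, 0.74, 0.03), rnorm(n, 0.71, 0.03))
)

# Fit a linear discriminant model on the two assumed predictors.
fit <- lda(group ~ mmse + nwbv, data = toy)

# Hypothetical questionnaire response to classify.
new_answers <- data.frame(mmse = 22, nwbv = 0.70)
pred <- predict(fit, newdata = new_answers)

pred$class      # predicted group
pred$posterior  # posterior probability of each group
```

The posterior probabilities are often more informative than the predicted class alone, since they show how close a response is to the decision boundary.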

6 Discussion

While this data set offers insight into a wide range of medical and social determinants, it is limited in several ways. This is partly due to its longitudinal nature: data of this kind is time-consuming to gather, and recruiting willing participants is difficult. As a result, participants share this willingness in common, which reduces generalizability to the whole population. All participants were right-handed, and there is conflicting evidence as to whether this increases (Ryan, Kreiner, and Paolo 2020) or decreases (Leon et al. 1986) the incidence of dementia onset (caused by Alzheimer’s disease). Either way, this also reduces generalizability.

Table 3.1 counts 3 observations with a CDR of 2.0, none with a CDR of 3.0 or above, and 351 with a CDR of less than 2.0. Issues with consent may mean it is harder to recruit participants with moderate to severe dementia. This led to some unexpected results: increased age was not shown to significantly increase CDR (p value = 0.44), which contradicts studies showing that risk increases exponentially with age up to 90 (Jorm and Jolley 1998). Table 3.1 shows a higher mean age in the CDR 2.0 bracket, but the small sample size means this is not evident in Figure 3.1.

Patient data could be expanded to include other determinants or to break down the determinants used in this analysis. For example, it has been speculated that the risk associated with age is due to related factors such as higher blood pressure, changes to cell structure or the weakening of the body’s repair systems. This workflow could be altered to include these by changing the columns used where necessary (statistics, plots, summary tables, data tidying).

Sample size also limited the dementia2 analysis, because the converted sample was small (24 observations, 12 at each time point; see Table 4.1). To work around this, visit data was reduced to two levels (start and end) rather than multiple (visit 1, 2, 3, etc.). So, while the analysis identifies determinants that led to an increase over time, it does not specify the length of time involved; this is another area that could be examined further with more data.

Overall, having a longitudinal study allowed temporal aspects of dementia onset to be considered, but led to limited data collection.

7 Word Count

Word count is calculated using wordcountaddin (Marwick 2020).

This rmd script: 1245
The dementia grouping questionnaire script: 298
The README: 295
Total: 1838

References

Allaire, JJ, Yihui Xie, Jonathan McPherson, Javier Luraschi, Kevin Ushey, Aron Atkins, Hadley Wickham, Joe Cheng, Winston Chang, and Richard Iannone. 2020. Rmarkdown: Dynamic Documents for R. https://github.com/rstudio/rmarkdown.
Attali, Dean. n.d. “Shinyforms.” GitHub.
Baumgart, Matthew, Heather M Snyder, Maria C Carrillo, Sam Fazio, Hye Kim, and Harry Johns. 2015. “Summary of the Evidence on Modifiable Risk Factors for Cognitive Decline and Dementia: A Population-Based Perspective.” Alzheimers Dement. 11 (6): 718–26.
Burns, Alistair, and Steve Iliffe. 2009. “Dementia.” BMJ 338 (February): b75.
Chang, Winston, Joe Cheng, JJ Allaire, Yihui Xie, and Jonathan McPherson. 2020. Shiny: Web Application Framework for R. https://CRAN.R-project.org/package=shiny.
Firke, Sam. 2020. Janitor: Simple Tools for Examining and Cleaning Dirty Data. https://CRAN.R-project.org/package=janitor.
Jorm, A F, and D Jolley. 1998. “The Incidence of Dementia: A Meta-Analysis.” Neurology 51 (3): 728–33.
Kassambara, Alboukadel. 2020. Ggpubr: ’ggplot2’ Based Publication Ready Plots. https://CRAN.R-project.org/package=ggpubr.
Kuhn, Max. 2020. Caret: Classification and Regression Training. https://CRAN.R-project.org/package=caret.
Leon, M J de, M E la Regina, S H Ferris, C I Gentes, and J D Miller. 1986. “Reduced Incidence of Left-Handedness in Clinically Diagnosed Dementia of the Alzheimer Type.” Neurobiol. Aging 7 (3): 161–64.
Marwick, Ben. 2020. Wordcountaddin: Word Counts and Readability Statistics in R Markdown Documents.
R Core Team. 2020. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Ryan, Joseph J, David S Kreiner, and Anthony M Paolo. 2020. “Handedness of Healthy Elderly and Patients with Alzheimer’s Disease.” Int. J. Neurosci. 130 (9): 875–83.
Venables, W. N., and B. D. Ripley. 2002. Modern Applied Statistics with S. Fourth. New York: Springer. http://www.stats.ox.ac.uk/pub/MASS4.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, and Jennifer Bryan. 2019. Readxl: Read Excel Files. https://CRAN.R-project.org/package=readxl.
Xie, Yihui. 2016. Bookdown: Authoring Books and Technical Documents with R Markdown. Boca Raton, Florida: Chapman and Hall/CRC. https://github.com/rstudio/bookdown.
Zhu, Hao. 2020. kableExtra: Construct Complex Table with ’kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.